Abstract


Fire blight (FB) is the most destructive bacterial disease of pome fruit trees around the world. In
recent years, spectrometry has been shown to be an accurate, real-time sensing technology for
plant disease detection. The main objective of this research was therefore the early detection of FB
in pear trees using visible-near-infrared (NIR) spectrometry. To this end, the reflectance spectra of
healthy leaves (ND), non-symptomatic diseased leaves (NS), and symptomatic diseased leaves (SY)
were captured in the visible-NIR spectral regions. To retain the important information in the spectra
while reducing the dimension of the data, three linear and non-linear manifold-based learning
techniques were applied: Principal Component Analysis (PCA), Sammon mapping and multilayer
auto-encoder (MAE). The output of the manifold-based learning techniques was used as input to the
SIMCA (Soft Independent Modeling by Class Analogy) classification model to discriminate NS and ND
leaves. The best classification accuracy was obtained using PCA on the first-derivative spectra, with
accuracies of 95.8%, 89.3%, and 91.6% for ND, NS, and SY samples, respectively. These results support
the capability of manifold-based learning techniques for early detection of FB by spectrometry.

Introduction


At least 10% of global food production is lost to plant disease (Zhang et al., 2012).
Fire blight (FB) is the most destructive bacterial disease of apple (Malus domestica),
pear (Pyrus communis) and, more generally, of the Maloideae, a subfamily of the Rosaceae.
The causal agent is the necrogenic Gram-negative bacterium Erwinia amylovora (Ea). This
pathogen enters the plant through natural openings such as nectarthodes, or through
wounds on succulent aerial parts, and colonizes the apoplast of parenchyma cells. Once
inside a susceptible host plant, the bacteria multiply mainly in actively growing shoots,
inducing progressive necrosis of the infected tissues. In resistant host plants or in
non-host plants, the bacteria cause local cell death (a hypersensitive-like reaction) and
are unable to colonize the plant tissue further (Gaucher et al., 2013). To decrease
product losses, fast and timely identification of FB is very important (Bagheri et al.,
2018). Scouting is normally used for FB detection, but it is time-consuming and
laborious. Thus, an accurate, real-time sensing technology is needed to improve plant
disease detection (Futch et al., 2009). Spectrometry in the visible and near-infrared
(NIR) spectral ranges shows good potential for detecting plant disease and stress
(Sankaran et al., 2010; Bagheri et al., 2018), and various researchers (Delalieux et al.,
2007; Purcell et al., 2009; Spinelli et al., 2006; Yang et al., 2007) have used spectral
reflectance-based techniques for plant disease detection. Bravo et al. (2003) and Moshou
et al. (2004) developed a ground-based spectral data collection system for disease
detection in winter wheat fields, which achieved a classification accuracy of over 90%.
Naidu et al. (2009) applied visible-infrared spectrometry (350-2500 nm) to detect
grapevine leafroll disease. The classification accuracy based on stepwise discriminant
analysis ranged from 73% to 81%, depending on the features (vegetative indices) used to
separate infected (symptomatic and non-symptomatic) from healthy leaves. Liu et al.
(2010) applied neural network and PCA (Principal Component Analysis) techniques to
discriminate different fungal infection levels in rice panicles, using a portable
spectroradiometer in the laboratory (350 to 2500 nm). Their results indicated that
different fungal infection levels of rice panicles can be discriminated under laboratory
conditions using spectrometry data. Zhang et al. (2012) detected powdery mildew of wheat
with a spectroradiometer in a laboratory. The PLSR (Partial Least Squares Regression)
model, with a coefficient of determination (R²) of 0.80, was good for estimating disease
severity, and FLDA (Fisher Linear Discriminant Analysis) achieved an accuracy over 90%
for heavily damaged leaves. Mahlein et al. (2013) developed specific spectral disease
indices for detecting three leaf diseases of sugar beet: Cercospora leaf spot, sugar beet
rust and powdery mildew. Healthy leaves were discriminated from leaves infected with
Cercospora leaf spot, sugar beet rust and powdery mildew with accuracies of 92%, 87%, and
85%, respectively. Yuan et al. (2013) analyzed spectral data of winter wheat leaves for
detection of yellow rust disease. The spectral differences showed a stronger response in
380-650 nm for both healthy and diseased leaves at the leaf scale. Phadikar et al. (2013)
classified different types of rice diseases by extracting features from the infected
regions of rice plant images. To reduce the complexity of the classifier, important
features were selected using rough set theory (RST) so as to minimize the loss of
information. Using the selected features, a rule-based classifier was built that covered
all the diseased rice plant images and gave superior results compared to traditional
classifiers. Ten-fold cross-validation was performed to measure the efficiency of the
proposed method, which showed superiority over other methods (Phadikar et al., 2013).
Barbedo et al. (2015) presented an algorithm for automatic detection of Fusarium head
blight in wheat kernels using hyperspectral imaging. With a classification accuracy above
91%, the developed algorithm was robust to factors such as shape, orientation, shadowing
and clustering of kernels. Huang et al. (2015) proposed a new method for grading panicle
blast based on hyperspectral imaging. The method was based on the concept of the "bag of
textons", which defines a "bag of spectra words" (BoSW) model for hyperspectral image
data representation. The results indicated that the proposed method could effectively
grade panicle blast, with classification accuracies of up to 81.4% for six-class grading
and 96.4% for two-class grading in the validation datasets (Huang et al., 2015).

Based on the literature review, no research has yet been carried out on early detection
of FB using spectral data. Because a system for early detection of FB is needed,
visible-NIR spectrometry was used in the present research as a non-destructive method for
detecting FB in pear trees during the growing season. Several spectral preprocessing
techniques and three manifold-based learning methods for dimensionality reduction were
applied to the spectra for more accurate detection of NS infected leaves. To derive an
identification algorithm, the performance of the preprocessed reflectance spectra and the
dimensionality reduction methods was evaluated on affected trees at early stages.


Materials and Methods 


Pear leaf samples

Pear leaves of healthy and infected trees were collected from a 5 ha pear orchard in
Damavand, Tehran Province, Iran, in May 2016. Leaves were collected under clear sky
conditions between 10:00 and 14:00. In total, 34 healthy trees (ND), 50 symptomatic
infected trees (SY) and 22 non-symptomatic infected trees (NS) were selected (Fig. 1).

After collection, the leaf samples were packed in separate plastic bags and transported
immediately to a nearby indoor laboratory for spectral measurements. All leaves were
tested in the laboratory by the selective culture method to confirm the presence of
Erwinia amylovora in the samples. In this laboratory test, all samples were first washed
in tap water.


Infected tissues were surface-sterilized by immersion in 10% household bleach for 3 min
and rinsed twice in sterile distilled water (SDW) for a few minutes. Each leaf sample was
macerated in a few drops of SDW in a sterile glass Petri dish using a sterile scalpel and
forceps. Thirty minutes after maceration, 30 µL of macerated tissue was streaked onto
King's B (KB) agar medium. The plates were then incubated at 27°C for 2-3 days and
observed daily for bacterial growth. Suspected colonies of Erwinia amylovora (white,
circular, mucoid, and curved) were selected and further purified on KB agar at 27°C (King
et al., 1954).

Fig. 1. Images of ND, NS, and SY trees

Spectral data acquisition

A portable high-resolution fiber-optic spectrometer (AvaSpec 3648, Avantes, Netherlands)
covering 200 to 1100 nm with a resolution of 0.05-20 nm was used to collect the spectral
reflectance data of pear leaves under laboratory conditions. The dark spectrum (R_dark)
was obtained by turning off the light source (Dep UV, 78 W / 0.75 A, dimensions 315 x 165
x 140 mm, weight 5 kg; Avantes, Netherlands) and completely covering the tip of the
fiber-optic reflectance probe (7 fibers of 200 µm or 400 µm core, 6 light fibers, 1 read
fiber, N.A. = 0.22, standard 2 m length, splitting point in the middle; Avantes,
Netherlands). The reference spectrum (R_reference) was measured by turning the light
source on and placing the probe in front of a reference tile. The sample spectrum
(R_sample) was then measured, and the relative reflectance (R_relative) was calculated
with the following equation:


$$R_{\mathrm{relative}} = \frac{R_{\mathrm{sample}} - R_{\mathrm{dark}}}{R_{\mathrm{reference}} - R_{\mathrm{dark}}} \times 100 \qquad (1)$$


Because of the high level of noise in the 200-400 nm and 1000-1100 nm ranges, further
analysis was performed only on the spectral data in the range of 400-1000 nm. The
spectral measurements were taken 5 times for each sample. Data from ND, SY, and NS leaf
samples were collected and recorded. The absorbance at a given wavelength (λ) was
calculated as log(1/Reflectance) at that wavelength, based on the Beer-Lambert law.
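The reflectance calculation in Eq. (1), the restriction to 400-1000 nm and the absorbance transform can be sketched as follows. This is a minimal illustration and not the authors' acquisition code: the function names are hypothetical, the logarithm is assumed to be base 10, and the reflectance is assumed to be converted from percent to a fraction before the absorbance is computed.

```python
import math

def relative_reflectance(sample, reference, dark):
    """Eq. (1): percent reflectance at each wavelength from raw detector readings."""
    result = []
    for s, r, d in zip(sample, reference, dark):
        denominator = r - d
        if denominator == 0:
            raise ValueError("reference and dark readings are equal")
        result.append((s - d) / denominator * 100.0)
    return result

def crop(wavelengths, values, low=400.0, high=1000.0):
    """Keep only (wavelength, value) pairs inside the low-noise 400-1000 nm range."""
    return [(w, v) for w, v in zip(wavelengths, values) if low <= w <= high]

def absorbance(reflectance_percent):
    """log10(1/R), with R taken as a fraction (assumption: base-10 logarithm)."""
    values = []
    for r in reflectance_percent:
        if r <= 0:
            raise ValueError("reflectance must be positive")
        values.append(math.log10(100.0 / r))
    return values
```

For example, a sample reading of 60 counts with a reference of 110 and a dark reading of 10 gives a relative reflectance of 50%.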

Chemometric analysis

All data analysis and feature extraction were performed in MATLAB (version 2011b,
MathWorks, Inc., USA). Preprocessing and plotting of features were carried out in the
Unscrambler software (version X10; CAMO Software, Oslo, Norway). For each sample, three
measurements with two replications were carried out and the mean value was obtained for
each concentration. Two of the most commonly used scatter-correction techniques in
spectroscopy, Multiplicative Scatter Correction (MSC) (Geladi et al., 1985) and Standard
Normal Variate (SNV) (Barnes et al., 1989), were used. MSC aims to reduce scattering
effects by fitting each spectrum to a reference spectrum, usually the mean spectrum of
the data set, by linear least-squares regression. First-derivative filtering based on the
Savitzky-Golay (SG) method was used to remove offsets (Savitzky and Golay, 1964). The
SIMCA classification accuracy obtained with these preprocessing methods is shown in
Table 1. To detect ND, NS and SY samples, spectral variables extracted from the
preprocessed spectra were used directly as inputs to the classification algorithms.
Manifold-based learning techniques were applied to reduce the dimensionality of the
high-dimensional spectral variable space in order to improve the performance of the
classifier (Tian, 2010).
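The three preprocessing steps named above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the Unscrambler implementation used in the study: the function names are hypothetical, spectra are assumed to be stored one per row, SNV uses the sample standard deviation, and the Savitzky-Golay derivative assumes a 5-point window with a quadratic polynomial and unit wavelength spacing, since the source does not report the window settings.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares fit: row ~ intercept + slope * reference
        slope, intercept = np.polyfit(reference, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

def sg_first_derivative(spectra):
    """Savitzky-Golay first derivative, 5-point quadratic window (assumed settings).

    The two points at each edge are dropped, so each output row is 4 values shorter.
    """
    spectra = np.asarray(spectra, dtype=float)
    # SG coefficients (-2, -1, 0, 1, 2) / 10, reversed because np.convolve flips the kernel
    kernel = np.array([2.0, 1.0, 0.0, -1.0, -2.0]) / 10.0
    return np.array([np.convolve(row, kernel, mode="valid") for row in spectra])
```

Spectra that differ only by an additive offset and a multiplicative gain collapse onto the mean spectrum after `msc`, which is the scatter effect the correction is designed to remove.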

Measured data are usually high-dimensional vectors, whereas a compact and precise
manifold representation is required (Ghodsi, 2006). It is therefore necessary to retain
the smallest set of essential parameters that describe the information in the data and to
remove redundant data. Here, three common unsupervised dimensionality reduction methods
were used: Principal Component Analysis (PCA) (Jolliffe, 2002), Sammon mapping (Sammon,
1969), and multilayer auto-encoders (MAE) (Demers and Cottrell, 1993). The number of
features tested for these dimensionality reduction methods ranged from four to seven. To
keep conditions identical across methods, four features were selected.
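For the linear case, reducing each spectrum to four features can be sketched with a PCA computed by singular value decomposition. This is a minimal illustration rather than the MATLAB code used in the study; the function names are hypothetical, and the only setting taken from the text is the choice of four components.

```python
import numpy as np

def pca_fit(spectra, n_components=4):
    """Return the mean spectrum, the loading vectors and their explained-variance ratios."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by decreasing singular value
    _, singular_values, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return mean, vt[:n_components], explained[:n_components]

def pca_transform(spectra, mean, components):
    """Project spectra onto the retained components (one row of scores per spectrum)."""
    return (np.asarray(spectra, dtype=float) - mean) @ components.T
```

The score matrix returned by `pca_transform` has one row per leaf spectrum and four columns, which is the low-dimensional input a classifier would receive in place of the full spectrum.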

Multilayer auto-encoders are feed-forward neural networks with an odd number of hidden
layers (Demers and Cottrell, 1993) that share weights between the top and bottom layers
(although asymmetric network structures may be employed as well). In PCA the data
transformation is linear, while Sammon mapping and MAE are non-linear methods. PCA is one
of the classical methods of dimensionality reduction; it is also known as the
Karhunen-Loève transform and is commonly computed by singular value decomposition (SVD).
The key idea of PCA is to find the low-dimensional linear subspace that captures the
maximum proportion of the variation within the data. According to Lorente et al. (2015)
and Lee and Verleysen (2007), PCA and Sammon mapping are suitable for small data sets
such as those used in this work, since the simplest manifold learning models are more
appropriate for small, high-dimensional data sets. Furthermore, MAE was applied as a
nonlinear method to support the results of the two other manifold learning techniques.
The application of MAE to plant disease detection has not been reported yet, although
neural networks are more common (Laurindo et al., 2017; Liu et al., 2010).

The goal of SIMCA as a classification method is to obtain a classification rule for a set
of m known groups; it is thus used to distinguish m classes, with emphasis on the
similarity within each class (Vanden Branden and Hubert, 2005). In the SIMCA algorithm
used here, the structure of the training samples of each class was described by the
dimensionality reduction techniques (PCA, Sammon mapping and MAE). De Maesschalck et al.
(1999) compared the original SIMCA with the modifications by Hawkins (Mahalanobis
distance) and Gnanadesikan with respect to classification accuracy and robustness towards
the number of PCs selected to describe the different classes. SIMCA modified with the
Mahalanobis distance was found to be a good alternative to the original SIMCA and seems
to be more robust for finding outliers when the exact number of PCs for building the
model is not known (De Maesschalck et al., 1999).
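The class-modeling idea behind SIMCA can be sketched as follows: one PCA model is fitted per class, and a new spectrum is compared with each model through its residual distance. This is a simplified illustration of the original residual-based SIMCA, not the procedure used in the study: the function names are hypothetical, no F-test or critical limit is applied, and the sample is simply assigned to the class with the smallest scaled residual.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Fit a PCA model to the training spectra of one class."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    centered = X - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T  # shape (features, components)
    residuals = centered - centered @ loadings @ loadings.T
    # Pooled residual standard deviation of the class (classical SIMCA degrees of freedom)
    dof = max((n - n_components - 1) * (p - n_components), 1)
    s0 = np.sqrt(np.sum(residuals ** 2) / dof)
    return {"mean": mean, "loadings": loadings, "s0": s0}

def distance_to_model(x, model):
    """Residual standard deviation of one sample, scaled by the class residual s0."""
    loadings = model["loadings"]
    centered = np.asarray(x, dtype=float) - model["mean"]
    residual = centered - centered @ loadings @ loadings.T
    p, a = loadings.shape
    s = np.sqrt(np.sum(residual ** 2) / (p - a))
    return s / model["s0"]

def simca_predict(x, models):
    """Return the label of the closest class model and all scaled distances."""
    distances = {label: distance_to_model(x, m) for label, m in models.items()}
    return min(distances, key=distances.get), distances
```

Because each class has its own model, a sample can be close to several models or to none; the scaled distances returned by `simca_predict` carry that information even though the sketch reports only the nearest class.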

Leave-out cross-validation was performed to evaluate and compare the performance of the
classification models. In each of five iterations, 20% of the data were left out for
validation and the remaining 80% were used for calibration. A mean confusion matrix was
created as the average over all iterations. To evaluate classification performance, the
overall accuracy of each classification model was computed from its mean confusion matrix
(Fleiss, 1981).
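The validation scheme described above (five iterations, 20% of the data held out each time, a mean confusion matrix and an overall accuracy) can be sketched in plain Python. The function names and the `train_and_predict` callback are hypothetical placeholders for whichever classifier is being evaluated, and the random split with a fixed seed is an assumption, since the source does not say how samples were assigned to iterations.

```python
import random

def make_folds(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    position = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[position[t]][position[p]] += 1
    return matrix

def mean_matrix(matrices):
    """Element-wise average of equally sized matrices."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / n for c in range(cols)] for r in range(rows)]

def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cross_validate(samples, labels, classes, train_and_predict, n_folds=5, seed=0):
    """Hold out each fold once; return the mean confusion matrix and its accuracy."""
    matrices = []
    for fold in make_folds(len(samples), n_folds, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in fold],
        )
        matrices.append(confusion_matrix([labels[i] for i in fold], predictions, classes))
    mean_cm = mean_matrix(matrices)
    return mean_cm, overall_accuracy(mean_cm)
```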


Results and Discussion 


Development of classification models 

Using the SIMCA algorithm, all acquired data were classified into healthy and diseased
classes. Combinations of the spectral range (visible to short-wave NIR), the
preprocessing methods (no preprocessing, SG, MSC and SNV), and the dimensionality
reduction techniques (PCA, Sammon mapping and MAE) were evaluated with regard to their
classification performance in detecting NS leaves. The results of the SIMCA classifier on
the visible-NIR spectra with the different preprocessing techniques and the three
dimensionality reduction methods are shown in Table 1. The accuracy of SIMCA and other
linear classification methods has previously been confirmed for early bruise detection in
apples in different spectral regions (Baranowski et al., 2012) and for pistachio
classification (Vitale et al., 2013). The most accurate SIMCA model was the PCA manifold,
with 95.8% accuracy for ND leaves on the first-derivative spectra. Overall, the
first-derivative spectra gave more accurate classification results than the MSC and SNV
methods. Lorente et al. (2015) obtained their best classification results with raw
spectra without preprocessing, with small differences from the SNV and MSC methods. This
is because the scatter-correction methods only remove the structural and physical
variations of the spectra, while preserving the absorption properties related to the
chemical components. It should be noted that the first-derivative preprocessing method
was not evaluated in that study. Consequently, MSC and SNV probably removed some
information important for FB detection from the spectral measurements, as both try to fit
the spectra to the mean spectrum. The total accuracy of each dimensionality reduction
method was generally higher than the accuracy of the individual classes. Since the
largest number of samples belonged to SY, this group had the greatest weight in the total
result.
To display the classification of the NS group, Coomans' plots were also drawn for each
pair of groups. The horizontal and vertical axes of a Coomans' plot show the distances to
two class models, together with the SIMCA critical distance (0.95) of each class. In this
study, ND, SY and NS data were assigned to class 1, class 2 and class 3, respectively. A
Coomans' plot has four regions. Samples that belong to both classes fall in the
lower-left square, meaning that they lie at only a small distance from both models, while
samples outside the range of both models are scattered in the upper-right square. Fig. 2
shows the Coomans' plots of the different sample groups with the SIMCA model. For
example, in Fig. 2a, samples in the horizontal rectangle belong to class 2 and the
vertical rectangle on the left contains samples of class 1. According to Fig. 2, NS
samples overlapped more with SY in the three data reduction methods studied (Fig. 2c, f
and h), whereas ND and SY were clearly separated. Overlap was rarely seen between NS and
ND (Fig. 2b, e and g). According to Table 1, the first-derivative preprocessed spectra
gave better results than the other methods; therefore, the Coomans' plots were drawn from
the SG-preprocessed spectra. The usefulness of Coomans' plots has also been demonstrated
by the full differentiation of aroma components of wild strawberries (Negri et al.,
2015).
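The four regions of a Coomans' plot can be expressed as a small decision rule on the two model distances. The sketch below is illustrative only: the function names are hypothetical, and the critical distances are passed in as plain numbers rather than derived from a 95% limit, which the source does not detail. The horizontal axis is taken as the distance to class 1 and the vertical axis as the distance to class 2, matching the description of Fig. 2a.

```python
from collections import Counter

def coomans_region(dist_class1, dist_class2, crit_class1, crit_class2):
    """Name the plot region for one sample from its distances to two class models."""
    inside1 = dist_class1 <= crit_class1
    inside2 = dist_class2 <= crit_class2
    if inside1 and inside2:
        return "both classes"   # lower-left square
    if inside1:
        return "class 1 only"   # vertical rectangle on the left
    if inside2:
        return "class 2 only"   # horizontal rectangle at the bottom
    return "neither class"      # upper-right square

def count_regions(distance_pairs, crit_class1, crit_class2):
    """Tally how many samples fall in each region."""
    return Counter(
        coomans_region(d1, d2, crit_class1, crit_class2) for d1, d2 in distance_pairs
    )
```

A large count in the "both classes" region for a pair of groups corresponds to the overlap reported between NS and SY samples.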


Table 1- SIMCA classification accuracy (%) of raw and preprocessed spectra for each
dimensionality reduction technique in the visible-NIR region

Dimensionality         Classes   Classification accuracy (%)
reduction technique              Raw data   1st derivative   SNV    MSC
PCA                    ND        90.3       95.8             92.7   93.1
                       NS        86.9       89.3             85.7   88.4
                       SY        93.1       91.6             94.0   91.6
                       Total     96.4       98.0             97.4   96.94
Sammon mapping         ND        85.9       92.0             88.2   91.3
                       NS        81.4       83.9             81.5   83.3
                       SY        90.5       90.6             91.7   89.1
                       Total     92.4       95.0             93.8   93.94
MAE                    ND        88.5       94.7             90.9   92.2
                       NS        83.9       89.3             83.8   86.5
                       SY        92.6       90.4             92.0   89.7
                       Total     94.8       97.0             95.3   95.2


Partial Least Squares (PLS)

PLS, as a full-spectrum method, was employed for the analysis of FB in pear leaves. A
calibration model was developed and used to evaluate the test data, and a first-order
polynomial equation was then fitted to the predictions. The PLS regression models using
calibration and validation data of raw, SG, SNV and MSC spectra are shown in Fig. 3.
Similar agreement between actual and predicted data has been reported for detection of
palm oil adulteration in lard (Basri et al., 2017) and between predicted and reference
values of tocopherol content in olive oil (Cayuela and Garcia, 2017). Based on the
coefficient of determination of the PLS regression (Fig. 3), the calibration and
validation of the PCA classifier on SNV and MSC spectra fitted well in comparison with
the nonlinear techniques. According to Table 1, the first derivative and MSC are the more
accurate preprocessing methods. The three dimensionality reduction techniques were
therefore combined with these two preprocessing methods in order to distinguish NS leaves
(Table 2). The results of the manifold learning-based methods were close to one another,
although the PCA prediction was more accurate. Lorente et al. (2015) showed similar
results on citrus. Based on these results, the combination of MSC and PCA gave a better
predictive model than Sammon mapping and MAE. In most cases, the models for the three
dimensionality reduction techniques were robust and accurate, because they were able to
identify the NS leaves as members of the SY class.

Fig. 3. PLS regression model using calibration and validation data of (a) raw, (b) SG,
(c) SNV and (d) MSC spectra
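The agreement between reference and predicted values in a regression plot of this kind is usually summarized by a fitted first-order line and a coefficient of determination. The sketch below shows both calculations in plain Python; the function names are hypothetical and the code does not reproduce the PLS model itself, only the goodness-of-fit summary applied to its predictions.

```python
def fit_line(x, y):
    """Least-squares first-order polynomial y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A value of `r_squared` close to 1 for both the calibration and the validation set indicates that the model predicts unseen samples about as well as the samples it was fitted on.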


Table 2- Prediction percentage of NS as diseased or healthy leaves

Preprocessing                 PCA               Sammon mapping    MAE
                              Diseased   NS     Diseased   NS     Diseased   NS
1st derivative   Diseased     96.0       76.5   89.0       55.5   95.0       61.2
                 NS           2.3        22.3   11.0       42.1   3.1        36.8
MSC              Diseased     98.0       75.9   94.0       62.2   97.0       71.4
                 NS           2.0        24.0   5.0        32.9   1.6        29.0

Conclusions

In the present research, the ability of spectrometry to detect FB disease of pear trees
was assessed. For this purpose, the visible and NIR spectra of pear leaves were obtained
with a spectrometer (200-1100 nm). To eliminate irrelevant information, several spectral
preprocessing techniques (first-derivative SG, MSC and SNV) were applied to the spectra.
To detect non-symptomatic FB-affected leaves at an early stage, manifold-based learning
methods (PCA, Sammon mapping and MAE) were applied to transform the high-dimensional
spectral data into meaningful low-dimensional representations.

According to the results, the maximum NS classification accuracy (89.3%) was obtained by
employing PCA and MAE on the first-derivative spectra. The second-highest NS
classification accuracy (88.4%) was obtained by employing PCA on the MSC-corrected
visible-NIR spectra. From these results, it can be concluded that the linear manifold
learning technique for dimensionality reduction (PCA) is more accurate than the
non-linear techniques (Sammon mapping and MAE). Furthermore, first-derivative SG
preprocessing gave more accurate classification results than MSC, followed by SNV, and ND
was discriminated from NS and SY at the 5% significance level. Based on these results,
the present research outlines the structure of an automatic commercial system, based on
dimensionality reduction and classifier methods, for early detection of FB in the near
future.